This is the second take-home exercise for ISS608. This time, we aim to build upon the population pyramid built in Take-Home Exercise 01. I have created 2 separate pyramid-based visualizations: a static visualization that allows comparison of 2 pyramids, each for a different planning area, and a GIF that shows changes in the pyramid over time.
Exercise Task and Goals
1. Break down the demographic data by Planning Area.
2. Apply appropriate animation or interactivity methods to the population pyramid visualization.
Installing and running the packages
For this exercise, we will require additional packages that will allow us to implement animation and interaction techniques. The code chunk below is for the loading of required packages.
packages = c('tidyverse','readxl','ggiraph','plotly','gganimate','DT','patchwork','gifski','gapminder','lemon', 'dplyr')
for (p in packages) {
  # Install the package only if it cannot be loaded yet
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
Read CSV
For this exercise, we will be using data on Singapore’s demographic breakdown by planning area from 2000 to 2020. The base CSV files were obtained from Singapore’s Department of Statistics and are read into 2 data tables.
demo1 <- read_csv('data/respopagesextod2000to2010.csv')
demo2 <- read_csv('data/respopagesextod2011to2020.csv')
Data Cleaning and Manipulation
After reading the CSV files, we will now clean the data and reshape it into the form required for our visualization.
1. Combining the datatables
First, we combine the 2 data tables into 1.
joined_demo <- rbind(demo1, demo2)
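`rbind()` stops with an error when the two tables do not share the same column names, so a quick check before combining can save some debugging. The sketch below uses two small made-up tables: `toy1` and `toy2` are hypothetical stand-ins for `demo1` and `demo2`, and their columns and values are assumed for illustration only.

```r
# Toy stand-ins for demo1 and demo2; the values are made up.
toy1 <- data.frame(PA = "Bishan", AG = "0_to_4", Sex = "Males",
                   Pop = 100, Time = 2010)
toy2 <- data.frame(PA = "Bishan", AG = "0_to_4", Sex = "Males",
                   Pop = 110, Time = 2011)

# rbind() needs matching column names, so confirm that first.
same_columns <- identical(names(toy1), names(toy2))

# Combine only when the structure matches.
if (same_columns) {
  toy_joined <- rbind(toy1, toy2)
  print(nrow(toy_joined))
}
```

With the real data, the same check would be `identical(names(demo1), names(demo2))`.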
2. Aggregating the data
As we are required to show the population pyramid by Planning Area and changes across time, for our aggregation, we will include the Time and PA columns into our group_by function unlike in Take-Home Exercise 1.
agg_pop <- joined_demo %>%
group_by(AG, Sex, Time, PA) %>%
summarise(Pop = sum(Pop)) %>%
ungroup()
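To see what this aggregation does, here is a small sketch on made-up rows. It assumes the raw data holds several rows for each combination of planning area, age group, sex and year (the `SZ` subzone column below is an assumption for illustration), and it uses base R's `aggregate()` as a stand-in for the `group_by()` and `summarise()` pipeline above.

```r
# Made-up rows: two subzones contribute to the same PA / AG / Sex / Time cell.
toy <- data.frame(
  PA   = c("Bishan", "Bishan", "Bishan"),
  SZ   = c("Bishan East", "Marymount", "Bishan East"),  # assumed column
  AG   = c("0_to_4", "0_to_4", "5_to_9"),
  Sex  = c("Males", "Males", "Males"),
  Time = c(2010, 2010, 2010),
  Pop  = c(100, 50, 80)
)

# Sum Pop within each AG / Sex / Time / PA group, dropping the subzone detail.
toy_agg <- aggregate(Pop ~ AG + Sex + Time + PA, data = toy, FUN = sum)
print(toy_agg)
```

The two `0_to_4` rows collapse into a single row, leaving one `Pop` value per age group, sex, year and planning area, which is the shape the pyramid needs.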
3. Necessary edits for visualization
Then we will change the single-digit age groups to double digits so that they are arranged in the correct order in the visualization.
agg_pop$AG[agg_pop$AG=="5_to_9"] <- "05_to_09"
agg_pop$AG[agg_pop$AG=="0_to_4"] <- "00_to_04"
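The reason for the padding is that age-group labels are text, and text is ordered character by character rather than by numeric value. A minimal sketch with three labels shows the difference (the vectors below are illustrative and not taken from the data):

```r
raw_groups    <- c("0_to_4", "5_to_9", "10_to_14")
padded_groups <- c("00_to_04", "05_to_09", "10_to_14")

# method = "radix" sorts in the C locale, so the result does not depend
# on the machine's language settings.
sorted_raw    <- sort(raw_groups, method = "radix")
sorted_padded <- sort(padded_groups, method = "radix")

print(sorted_raw)     # "10_to_14" lands before "5_to_9"
print(sorted_padded)  # age groups appear in their natural order
```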
Finally, we will convert all Pop values of male to negative so that they will appear on the left side of the pyramid(s).
agg_pop$Pop <- ifelse(agg_pop$Sex == "Males",-1*agg_pop$Pop,agg_pop$Pop)
We will then check whether there is any missing data.
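One way to run this check, sketched here on a made-up table standing in for `agg_pop`, is to count the `NA` values in each column. The table and its values are hypothetical.

```r
# Made-up table with two deliberately missing values.
toy_pop <- data.frame(
  AG  = c("00_to_04", "05_to_09", NA),
  Sex = c("Males", "Females", "Males"),
  Pop = c(-120, 130, NA)
)

# Count the missing values in every column.
missing_per_column <- colSums(is.na(toy_pop))
print(missing_per_column)

# TRUE if at least one column contains a missing value.
any_missing <- any(missing_per_column > 0)
print(any_missing)
```

With the real data, the equivalent call would be `colSums(is.na(agg_pop))`; a result of all zeros means there is nothing to clean up.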
Creating the Visualization
1. Changes to Planning Area Demographics over 20 years
As there are multiple planning areas in Singapore, we will create a function that builds an animated population pyramid for whichever planning area we pass in.
In the code chunk shown below, the lines up to ‘coord_flip()’ create the basic pyramid. The ‘transition_time’ function makes the visualization cycle through the population values according to the ‘Time’ column.
Finally, the last line of code animates the visualization using ‘gifski_renderer’ to create a GIF that lasts 10 seconds.
create_gif <- function(PAselect){
filter_pop <- filter(agg_pop, PA == PAselect)
P <- ggplot(filter_pop, aes (x = AG, y = Pop/1000, fill = Sex)) +
geom_bar(data = subset(filter_pop, Sex == "Females"), stat = "identity") +
geom_bar(data = subset(filter_pop, Sex == "Males"), stat = "identity") +
scale_y_continuous(labels = abs) +
labs(
title = paste("Population Pyramid for",PAselect,"2000 - 2020\n\n Year: {as.integer(frame_time)}"), x = "Age Group", y = "Population in thousands"
) +
coord_flip() +
transition_time(Time)+
ease_aes('linear')
animate(P,fps = 24,duration = 10, renderer = gifski_renderer())
}
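The `as.integer(frame_time)` in the title deserves a short note. At 24 frames per second over 10 seconds the animation has 240 frames but only 21 distinct years, so most frames fall between two years and `frame_time` is fractional. The sketch below approximates this by assuming the frames are spaced evenly between 2000 and 2020; that spacing is an assumption for illustration, not a reproduction of gganimate's internals.

```r
fps      <- 24
duration <- 10
n_frames <- fps * duration

# Assumption: frames are spaced evenly between the first and last year.
frame_times <- seq(2000, 2020, length.out = n_frames)
print(head(frame_times, 3))

# Truncating to an integer gives the year label shown in the title.
years_shown <- unique(as.integer(frame_times))
print(years_shown)
```

Under this assumption each year label stays on screen for roughly 11 or 12 frames, which is about half a second.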
To see which planning areas are listed in the data, we create a new table containing just the distinct planning areas.
planning_areas <- agg_pop %>%
distinct(PA)
colnames(planning_areas) <- c("Planning Area")
head(planning_areas,10)
# A tibble: 10 x 1
`Planning Area`
<chr>
1 Ang Mo Kio
2 Bedok
3 Bishan
4 Boon Lay/Pioneer
5 Bukit Batok
6 Bukit Merah
7 Bukit Panjang
8 Bukit Timah
9 Central Water Catchment
10 Changi
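Because the function filters on an exact match of the planning area name, a misspelled or differently capitalised name leaves nothing to plot. A small guard can catch this before any rendering starts. In the sketch below, `is_valid_pa()` is a hypothetical helper and `known_areas` is a made-up subset of the list above; with the real data the second argument would be the planning area column of `planning_areas`.

```r
# Made-up subset of the planning area list, for illustration only.
known_areas <- c("Ang Mo Kio", "Bedok", "Bishan")

# Hypothetical helper: TRUE when the name matches a listed area exactly.
is_valid_pa <- function(name, areas) {
  name %in% areas
}

print(is_valid_pa("Bishan", known_areas))  # exact match
print(is_valid_pa("bishan", known_areas))  # the match is case-sensitive
```

Calling `stopifnot(is_valid_pa(PAselect, known_areas))` at the top of a plotting function would be one way to use it.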
A sample GIF is created with the visualization function for the Bishan planning area, as shown below.
create_gif("Bishan")
2. Visualization for comparing the population pyramids of 2 planning areas
Next we will use plotly to create a diagram that will allow us to compare the pyramids for 2 different planning areas in the same year.
First, we will create 2 temporary data tables filtered according to the entered information, with 1 data table for each planning area. Following that, we will combine the two plots into a subplot under P and make the necessary labelling adjustments before completing the function with return(P).
create_plot <- function(PA1, PA2, Year){
# Filter the aggregated data directly; plotly's highlight() expects a plot, not a data table
filter_pop1 <- filter(agg_pop, PA == PA1, Time == Year)
filter_pop2 <- filter(agg_pop, PA == PA2, Time == Year)
P1 <- ggplot(filter_pop1, aes(x = Pop, y = AG, fill = Sex)) +
geom_col()+
scale_x_symmetric(labels = abs)
P2 <- ggplot(filter_pop2, aes(x = Pop, y = AG, fill = Sex)) +
geom_col()+
scale_x_symmetric(labels = abs)
P <- subplot(
  ggplotly(P1) %>%
    layout(annotations = list(x = 0.025, y = 1.05, text = PA1, font = list(size = 12),
                              showarrow = FALSE, xref = 'paper', yref = 'paper'),
           showlegend = FALSE),
  ggplotly(P2) %>%
    layout(annotations = list(x = 0.975, y = 1.05, text = PA2, font = list(size = 12),
                              showarrow = FALSE, xref = 'paper', yref = 'paper'),
           showlegend = TRUE),
  nrows = 1, margin = 0.1, shareY = TRUE, titleX = TRUE)
P <- P %>%
layout(title = list(text = '<b>Population Pyramids Comparison</b>', x = 0.5, y = 5 , font = list(size = 15)),yaxis = list(title = "Age Group"))
return(P)
}
For this example, we will compare Ang Mo Kio and Hougang in the year 2010.
create_plot("Ang Mo Kio","Hougang", "2010")
Conclusion
More interactive techniques are possible, such as building a Shiny dashboard that lets the user select, via checkboxes, which variables to display. The 2 visualizations shown here are only some of the most basic enhancements that can be made.